Introduction

Tennis and Its Origins

  • Tennis is a racquet sport that can be played in singles or doubles.

  • Each player uses a stringed racquet to hit a hollow, felt-covered rubber ball over a net and into the opposing court.

  • The earliest known version of tennis was a handball game played in France around the 12th century, called jeu de paume, or “game of the palm” (Castorino n.d.).

Governing Bodies and Types of Courts

  • There are three main governing bodies: the Association of Tennis Professionals (ATP), the Women’s Tennis Association (WTA) and the International Tennis Federation (ITF) (Score and Change n.d.).

  • Court surfaces include hard, clay, grass and carpet, among others.

  • Court characteristics in turn lead to many different types of balls, from low to high bouncing and from slow to fast.

  • There are many ball producers; well-known ones include Dunlop, Wilson and Prince.

Grand Slam Tournaments

  • The Australian Open is held in January, the French Open from late May to early June, Wimbledon from late June to early July, and the US Open from August to September.

  • The Australian Open and US Open take place on hard courts, whereas the French Open and Wimbledon take place on clay and grass courts respectively.

Scoring System

  • Tennis matches are often broken up into sets, with each set consisting of many games (USTA n.d.).

  • At Grand Slams, women play best of three sets and men play best of five; most other professional matches are best of three.

  • To win a set, a player typically needs to win six games with a margin of at least two. Within each game, points progress through 0, 15, 30 and 40.

  • If a game does not reach deuce (40-40), the player who wins the next point after reaching 40 wins that game.
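
The point progression above can be sketched in a few lines of Python. This is only an illustrative sketch of standard game scoring: the function name `game_score` and the "server"/"receiver" labels are assumptions made here, not part of the data set or the original analysis.

```python
POINT_NAMES = ["0", "15", "30", "40"]


def game_score(server_points: int, receiver_points: int) -> str:
    """Return the score call for one game, given points won by each side."""
    lead = server_points - receiver_points
    # A game is won with at least four points and a margin of two.
    if max(server_points, receiver_points) >= 4 and abs(lead) >= 2:
        return "game server" if lead > 0 else "game receiver"
    # Once both sides have reached 40, the call is deuce or advantage.
    if server_points >= 3 and receiver_points >= 3:
        if lead == 0:
            return "deuce"
        return "advantage server" if lead > 0 else "advantage receiver"
    return f"{POINT_NAMES[server_points]}-{POINT_NAMES[receiver_points]}"


# Example calls (hypothetical point counts):
print(game_score(2, 1))  # 30-15
print(game_score(3, 3))  # deuce
print(game_score(4, 2))  # game server
```

The deuce branch is why a player who "passes 40" only wins the game outright when the opponent has not also reached 40.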

Introductory Diagram

Tennis Court, Racquet and Ball

Courts’ Diagram

Grand Slam Courts and Their Type of Surfaces

Data Set and Objectives

Introduction to the Data Set

  • The “WTA/ATP Tennis” data set used in this study was obtained from Kaggle (2021). It comprises player information, WTA and ATP statistics, and match results from 1949 to 2021.

  • Notably, the data was originally collected by Jeff Sackmann (2023), who should be credited by anyone who uses this data set in the future.

  • Bear in mind that the period from 1968 onwards is referred to as the Open Era, as it was decided that both professionals and amateurs would be able to compete in Grand Slam events (Tennis Companion n.d.), removing the divisions that previously existed.

  • I would also like to clarify that plots number 7 to 10 in this documentary are inspired by plots from this link.

  • However, I have made changes here and there such as utilising different years or variables, and visually representing them differently according to my preferences and what I feel is best.

  • There are two files, namely KaggleMatches.csv and KagglePlayers.csv, containing the match and player information respectively.

  • The matches’ data set consists of 50 columns, while the players’ data set consists of 7 columns.

Matches’ Data Set

Players’ Data Set

Objectives

The three main objectives of this documentary are:

  1. To study the distribution of player nationalities along with the winning and losing percentages of players from each country.
  2. To investigate court and player characteristics over the years and how they influence the game.
  3. To discover the top players and their performances based on Grand Slam and non-Grand Slam titles.

Analysis

Putting My Country’s Name Out There!


  • The size of each tile represents the number of tennis players hailing from that particular country. This covers all players from 1949 to 2021.
  • There are a whopping 14,518 players from the United States of America.
  • This is followed by 5,802 and 4,109 players from the United Kingdom and Australia respectively.
  • Three of the four Grand Slams are held in these three countries, which increases the sport’s visibility among their citizens.
  • The population of the USA is also very large, amounting to almost 337 million in 2023 (Worldometers.info 2023).
  • Asian countries such as Japan, India, China and Korea also have a commendable number of players.
  • However, having many representatives from a country is not by itself an indicator of success.

From Sweet Success to Bitter Defeat Across Countries


  • The pie charts give valuable insights into the performance and competitiveness of tennis players across different countries.
  • For example, players from the United States have won 52.1% of their 167,361 matches.
  • The UK shows a similar performance to that of the USA, although its players have played far fewer matches, at 49,446.
  • Notably, Malaysians have won 67.7% of their 341 matches, which is quite a high success rate.
  • Nevertheless, this should be taken with a grain of salt: the success rate may be inflated by the small number of matches, as the percentages can be seen to converge towards 50-50 as the number of matches grows.
  • The pie charts emphasise the diversity of player performances across countries, and further investigation could give insights into each country’s strategy and development in tennis.
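
As a rough illustration of how such percentages might be computed, the sketch below counts matches played and won per country from (winner country, loser country) pairs. The function name and the input layout are assumptions made for this example; the actual column names in KaggleMatches.csv are not shown in this documentary, and the toy rows are invented purely to exercise the code.

```python
from collections import Counter


def win_percentage_by_country(matches):
    """matches: iterable of (winner_country, loser_country) pairs.

    Returns {country: (matches_played, win_percentage)}.
    """
    wins, played = Counter(), Counter()
    for winner_country, loser_country in matches:
        wins[winner_country] += 1
        played[winner_country] += 1
        played[loser_country] += 1
    return {c: (n, round(100 * wins[c] / n, 1)) for c, n in played.items()}


# Hypothetical toy rows, not taken from the data set.
toy = [("USA", "USA"), ("USA", "GBR"), ("MAS", "USA"), ("USA", "USA")]
print(win_percentage_by_country(toy))
```

Note that when two players from the same country meet, that country records one win and one loss. Under this way of counting, countries with many players will meet compatriots more often, which is one possible reason their percentages sit closer to 50-50.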

Evolution of the Ages


  • This plot helps us observe whether tennis dynamics have changed over the years, such as the age at which players enter or retire from the sport.
  • Interestingly, the average age of both genders plunges until around the start of the Open Era in 1968, after which it increases.
  • It then decreases again in the 1980s before increasing from the 1990s onwards.
  • As technology and training methods evolve, more and more youngsters start to join the game.
  • Following the Open Era, in which even amateurs are allowed to compete alongside professionals, more players might have been interested in joining tournaments.
  • An increase in average age could also indicate longer careers due to the improved health and stronger physique of tennis players over time.
  • The female average age is higher than that of the males until around the Open Era, after which it stays consistently below that of the male players.
  • Women joining the game earlier than men could be explained by their earlier physical maturity.
  • They might also end their careers earlier as they start families.
  • Serena Williams, an all-time great, was however reported to be planning a return to Grand Slam tennis just four months after giving birth (PA SPORT 2017).
  • Understanding the dynamics could be important to shape training programs and career paths, or even for talent identification purposes.

Battle of the Courts


  • The histograms help us further understand the characteristics and dynamics of gameplay on each surface.
  • At a glance, the histograms are all slightly skewed to the right, meaning that there are more matches with shorter durations and a smaller portion of matches with long durations.
  • Notably, the modal duration on grass courts is 108 minutes for males and 72 minutes for females.
  • On clay courts, 369 men’s matches are 108 minutes long, but almost as many (368) are 132 minutes long, 24 minutes more than the grass mode. For the women, the clay mode is 84 minutes, which is also longer than on grass.
  • Clay is the slowest of the three surfaces. It slows the ball down and increases the height of the bounce, making it excellent for players who love to use spin and play from the baseline (Adib 2017).
  • Grass courts are the opposite: the slippery surface keeps the ball fast, which suits those who love to play at the net.
  • Hard courts are more neutral: for the men, as many as 751 matches took around 120 minutes, between the typical grass and clay durations.
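
The modes quoted above are histogram modes, so they depend on how durations are binned. A minimal sketch of finding the most populated bin per surface is given below. The 12-minute bin width is an assumption (the reported values 72, 84, 108, 120 and 132 are all multiples of 12), and the function names and toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def modal_duration(durations_minutes, bin_width=12):
    """Return (bin_start, count) for the most populated duration bin.

    Ties are broken in favour of the shorter duration.
    """
    bins = Counter((int(d) // bin_width) * bin_width for d in durations_minutes)
    return max(bins.items(), key=lambda item: (item[1], -item[0]))


def modal_duration_by_surface(rows, bin_width=12):
    """rows: iterable of (surface, minutes) pairs -> {surface: (bin_start, count)}."""
    grouped = defaultdict(list)
    for surface, minutes in rows:
        grouped[surface].append(minutes)
    return {s: modal_duration(d, bin_width) for s, d in grouped.items()}


# Hypothetical toy rows, not taken from the data set.
toy = [("Grass", 110), ("Grass", 115), ("Grass", 75), ("Clay", 133), ("Clay", 140)]
print(modal_duration_by_surface(toy))
```

Changing `bin_width` can move the mode, so any comparison between surfaces should use the same bins throughout.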

Dominant Hand at Play in the Power of the Swing


  • An ace is a legal serve that the receiver fails to touch, winning the point for the server.
  • A double fault occurs when the server misses both serve attempts on the same point, losing that point.
  • One right-handed male player, John Isner, hit 113 aces in a single match (Wikipedia n.d.).
  • The median number of aces is the same for left- and right-handed males.
  • For the women, however, the third quartile and median of left-handed players are slightly higher, suggesting that left-handed females have a more effective and powerful serve.
  • Once again, the difference in distribution between left- and right-handed males is not pronounced for double faults.
  • The median is the same for left- and right-handed females, but 75% of right-handed women have up to 5 double faults, while 75% of left-handed women have up to 4.
  • The results suggest that left-handed women are more consistent in their serves.
  • The serving performance of male players is better than that of females, as they have more aces and fewer double faults in general.
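
The quartile comparison above can be reproduced with the standard library. The sketch below is an assumption about how one might do it rather than the method used for the box plots: the function name, the "L"/"R" keys and the toy counts are all hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which may differ slightly from the quartile rule of a plotting library.

```python
from statistics import quantiles


def serve_quartiles(counts_by_hand):
    """counts_by_hand: {hand: [per-match counts]} -> {hand: (q1, median, q3)}.

    Each group needs at least two observations.
    """
    return {
        hand: tuple(quantiles(values, n=4))
        for hand, values in counts_by_hand.items()
    }


# Hypothetical per-match double-fault counts, not taken from the data set.
toy = {"L": [1, 2, 3, 4, 5, 6, 7], "R": [0, 2, 4, 6, 8, 10, 12]}
print(serve_quartiles(toy))
```

Comparing the third element of each tuple corresponds to the "75% of players have up to N" statements in the list above.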

Height…Is It A Factor?


  • How does a player’s height influence his/her performance?
  • The plots let us observe whether any correlation exists between winning percentage and height.
  • First, notice that most men are clustered between roughly 180 cm and 190 cm.
  • There is no clearly observable correlation, suggesting that the success of male players could be attributed to a myriad of other factors such as speed, agility and endurance.
  • The women’s heights are more scattered, especially around 170 cm to 180 cm.
  • A very slight positive correlation can be seen for the women.
  • This suggests that height could help women win matches, as taller players have better reach and longer strides.
  • However, it is important to understand that correlation does not imply a causal relationship.
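
To put a number on "slight positive correlation", one option is the Pearson correlation coefficient between height and winning percentage. The sketch below is a plain-Python version with a hypothetical function name and invented toy values; it is not necessarily the statistic behind the plots.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical heights (cm) and winning percentages, not taken from the data set.
heights = [165, 170, 175, 180, 185]
win_pct = [48.0, 51.0, 49.5, 53.0, 52.0]
print(round(pearson_r(heights, win_pct), 3))
```

A value near 0 would match the men’s plot and a small positive value the women’s plot. Either way the coefficient only measures linear association and says nothing about causation.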

It’s Raining Grand Slams!


  • From here on, we’ll be talking about the top tennis players of all time.
  • We can appreciate their astounding successes and contributions to the sport. Their playing styles can also be studied to understand how they impact their game.
  • With a remarkable record of 24 Grand Slams, Margaret Court takes the lead. However, it must be noted that 13 of those Grand Slams were won before the Open Era.
  • Serena Williams is arguably one of the best tennis players of all time, with 23 Grand Slams in the Open Era (Nag 2023).
  • Her elder sister, Venus Williams, also makes the list with 7 Grand Slams.
  • Familiar names such as Rafael Nadal, Roger Federer and Novak Djokovic, also known as the Big Three, also appear.
  • Even a high-profile couple, Steffi Graf and Andre Agassi, make the list with 22 and 8 Grand Slams respectively.

Winning is my Middle Name, or Is It?


  • The empty tiles in R128 could indicate that the players did not need to play that round, or had a “bye” that let them proceed without competing.
  • On the other hand, there is an empty tile for Pete Sampras in the French Open final because he never progressed past the semifinal stage.
  • In the French Open, Rafael Nadal has a remarkable record: a 100% winning percentage in R128, R64, the semifinals and the finals, with all other rounds above 90%.
  • He is the King of Clay, with 13 French Open titles as of 2021. This suggests he is a strong baseline player who uses spin to wear down his opponents over time on this slow surface.
  • However, his performance isn’t as good in the Australian Open, which is mostly dominated by Novak Djokovic.
  • Rafael Nadal is nevertheless one of the four players who hold the Career Golden Slam, having won all four Grand Slam titles and an Olympic gold medal (Amir Rashid).
  • Looking at the Wimbledon performances, Pete Sampras does exceptionally well, with a winning percentage of 100% in the finals.
  • Many insights into the weaknesses and strengths of players across different court surfaces can be obtained.

Oh No…Haunting of the Bogey Players!


  • Researchers have even tested the bogey phenomenon in professional men’s tennis, examining whether there are bogey players who continually outperform a given opponent over an extended period of time (Bunker 2022).
  • Here, we use a simple analysis to list the players who beat a Top 10 player more than 50% of the time, the so-called bogey players.
  • Only 7 of the Top 10 players have bogey players.
  • Both Steffi Graf and Novak Djokovic have only one bogey player each.
  • Martina has beaten Steffi Graf in 5 of their 9 meetings, a winning percentage of 55.56%.
  • Notably, Nadal is the bogey player for both Djokovic and Federer, beating them 10 times out of 16 and 10 times out of 14 respectively.
  • Serena Williams has lost more times than she has won to Samantha Stosur, Karolina Pliskova, Justine Henin and Jennifer Capriati.
  • To determine their weak spots, top players could analyse their matches more deeply to see why they perform worse against certain opponents than others.
  • They could pinpoint their weaknesses and adapt their gameplay to the opponent they are going to meet.
  • The analysis also highlights the complexity of tennis rivalries and the diversity of playing styles among the top performers.
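
The bogey-player rule described above (an opponent who wins more than 50% of their meetings with a top player) can be sketched from head-to-head counts. The function name, the `min_meetings` parameter and the placeholder player names are assumptions made for this example and are not part of the original analysis.

```python
from collections import defaultdict


def bogey_players(matches, top_players, min_meetings=1):
    """matches: iterable of (winner, loser) name pairs.

    Returns {top_player: [(opponent, opponent_wins, meetings, win_pct), ...]}
    for opponents who won more than half of their meetings.
    """
    # (top_player, opponent) -> [opponent_wins, meetings]
    record = defaultdict(lambda: [0, 0])
    for winner, loser in matches:
        if loser in top_players:
            record[(loser, winner)][0] += 1
            record[(loser, winner)][1] += 1
        if winner in top_players:
            record[(winner, loser)][1] += 1
    result = defaultdict(list)
    for (top, opponent), (opp_wins, meetings) in record.items():
        if meetings >= min_meetings and opp_wins / meetings > 0.5:
            pct = round(100 * opp_wins / meetings, 2)
            result[top].append((opponent, opp_wins, meetings, pct))
    return dict(result)


# Placeholder names: "X" beats top player "T" in 5 of 9 meetings.
toy = [("X", "T")] * 5 + [("T", "X")] * 4
print(bogey_players(toy, {"T"}))  # {'T': [('X', 5, 9, 55.56)]}
```

Five wins in nine meetings gives 55.56%, the same arithmetic as the Steffi Graf example in the list. Raising `min_meetings` would filter out opponents who qualify on only one or two matches.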

Errr…Let’s Not Think Too Much About Grand Slam Titles Shall We?


  • Instead of the players with the most Grand Slam titles, we look at the best performers who have never actually won a Grand Slam.
  • There is a slight positive correlation: the winning percentage increases as the number of titles won increases.
  • The bubbles are also bigger on the left-hand side, indicating worse ranks in this area.
  • According to the plot, David Ferrer is one of the best players never to have won a Grand Slam. He has an impressive collection of 27 titles and a fairly high winning percentage of 66.13% over the 1119 matches he has played, well above 50%, so his wins are unlikely to be down to chance. This view is shared by various sources, one of them being EuroWeekly News (2022).
  • For the women, on the other hand, we have Agnieszka Radwanska, who has won 20 titles with a winning percentage of 68.55% over 795 matches.
  • Juan Manuel Cerundolo attains a winning percentage of 100%, but his highest rank is only 335, with one title to his name.
  • This analysis helps us recognise talent and shows that tennis success extends beyond winning Grand Slam tournaments.
  • It also serves as an inspiration to youngsters that great heights can be reached without achieving the ultimate goal of winning a Grand Slam title.

Conclusion

In a nutshell, a myriad of important insights have been obtained throughout this documentary. As such, I will highlight the main points to jog our memory a little:

  1. The United States, followed by the United Kingdom and Australia, has the most tennis player representatives.

  2. As the number of tennis players a country has increases, the winning and losing percentages of players in all tournaments converge towards a 50-50 percentage.

  3. The average age of tennis players fluctuates over time. The female average age stays consistently below that of the male players after the start of the Open Era.

  4. Matches typically last longest on clay courts and shortest on grass courts.

  5. Serving performance is not that different between left- and right-handed males, but is slightly better for left-handed females than for right-handed females.

  6. Height shows no clear correlation with male players’ winning percentage. However, there is a slight positive correlation between female height and winning percentage.

  7. Serena Williams has the most Grand Slams in the Open Era as of 2021.

  8. The heatmap of winning percentages for the top players across each round of Grand Slams can provide information on the weaknesses and strengths of players across different court surfaces.

  9. Even top performers have bogey players, against whom they would need to adapt their gameplay.

  10. David Ferrer is one of the best players never to have won a Grand Slam.

For tennis enthusiasts, commentary and analysis presented through visuals is the icing on the cake. Fans can gain a deeper understanding of the game when they are given statistical insights and data-driven storytelling, hopefully like those provided in this documentary. Through data analysis, they can learn about the intricacies of player performance, tournament dynamics and the different characteristics influencing the game. It can also make the sport more approachable and interesting to viewers, increasing their engagement as key statistics and crucial trends are broadcast in a simple storytelling manner.

On a higher level, coaches, players and other relevant parties can analyse player characteristics and statistics, performance metrics and match attributes to pinpoint areas for development and formulate winning strategies. Training schedules, strategies and game results can all benefit from more rigorous data analysis. Not only does tennis analysis enrich the entertainment industry, it also enhances player performance and drives advancement in the field!